Topic Oriented Semi-supervised Document Clustering

Authors

  • Jiangtao Qiu
  • Changjie Tang
Abstract

While developing a text mining prototype system, we needed to group documents according to the author’s needs. However, traditional document clustering is usually treated as unsupervised learning, so it cannot effectively group documents according to the user’s needs. To solve this problem, we propose a new document clustering approach. The main contributions are: (1) describing the user’s need with a multiple-attribute topic; (2) a topic-semantic annotation algorithm; and (3) an optimizing hierarchical clustering algorithm that uses a criterion function to find the best clustering solution on the clustering tree. Experiments validate the feasibility and effectiveness of the new approach.
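
The idea of picking the best solution on a clustering tree with a criterion function can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the criterion function or the topic-semantic annotation step, so this example assumes plain document vectors, average-link agglomerative merging on cosine similarity, and a hypothetical criterion (mean within-cluster similarity minus mean between-cluster similarity). All function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def avg_link(c1, c2, vecs):
    """Average pairwise similarity between two clusters of document indices."""
    total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def clustering_tree(vecs):
    """Return one partition per level of the agglomerative tree,
    from all singletons down to a single cluster."""
    clusters = [[i] for i in range(len(vecs))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Merge the two most similar clusters (average link).
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_link(clusters[p[0]], clusters[p[1]], vecs),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
    return levels


def criterion(partition, vecs):
    """Hypothetical criterion (an assumption, not taken from the paper):
    mean within-cluster similarity minus mean between-cluster similarity."""
    within, between = [], []
    for i, j in combinations(range(len(vecs)), 2):
        same = any(i in c and j in c for c in partition)
        (within if same else between).append(cosine(vecs[i], vecs[j]))
    if not within or not between:
        return float("-inf")  # trivial partitions are never selected
    return sum(within) / len(within) - sum(between) / len(between)


def best_partition(vecs):
    """Scan every level of the tree and keep the highest-scoring partition."""
    return max(clustering_tree(vecs), key=lambda p: criterion(p, vecs))


# Toy document vectors (made up for illustration): two loose groups.
docs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
print(best_partition(docs))
```

Scanning every level of the tree keeps the sketch simple; a different criterion function can be swapped in by replacing `criterion` without touching the merging step.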

Similar resources

Semi-Supervised Co-Clustering for Query-Oriented Theme-based Summarization

Sentence clustering plays an important role in theme-based summarization which aims to discover the topical themes defined as the clusters of highly related sentences. However, due to the short length of sentences, the word-vector cosine similarity traditionally used for document clustering is no longer suitable. To alleviate this problem, we regard a word as an independent text object rather t...

From Topic Models to Semi-supervised Learning: Biasing Mixed-Membership Models to Exploit Topic-Indicative Features in Entity Clustering

We present methods to introduce different forms of supervision into mixed-membership latent variable models. First, we introduce a technique to bias the models to exploit topic-indicative features, i.e. features which are a priori known to be good indicators of the latent topics that generated them. Next, we present methods to modify the Gibbs sampler used for approximate inference in such mod...

A Semi-supervised Topic-Driven Approach for Clustering Textual Answers to Survey Questions

We propose an algorithm to effectively cluster a specific type of text documents: textual responses gathered through a survey system. Due to the peculiar features exhibited in such responses (e.g., short in length, rich in outliers, and diverse in categories), traditional unsupervised and semisupervised clustering techniques are challenged to achieve satisfactory performance as demanded by a su...

User-Interest-Based Document Filtering via Semi-supervised Clustering

This paper studies the task of user-interest-based document filtering, where users aim to find documents on a specific topic in a large document collection. This is usually done by a text categorization process, which divides all the documents into two categories: one containing all the desired documents (called positive documents) and the other containing all the other documents (c...

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

Publication date: 2007